Nonlinear Speech Features for the Objective Detection of Discontinuities in Concatenative Speech Synthesis
Abstract
An objective distance measure that can predict audible discontinuities is very important in concatenative speech synthesis systems. Previous results showed that linear approaches are not very effective at detecting audible discontinuities: the best result, a detection rate of 37%, was obtained using the Kullback-Leibler distance on power spectra. In this paper, we present two nonlinear approaches for detecting discontinuities. The first is based on a nonlinear harmonic model of speech, while the second is based on demodulating speech into amplitude and frequency components using the Teager energy operator. Results show that the detection rate can exceed 70%, an improvement of about 95% over previously published results.
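The two signal-level tools named in the abstract can be illustrated with a minimal sketch. It assumes the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and a symmetrized Kullback-Leibler distance between power spectra normalized to sum to one. The abstract does not give the authors' exact feature definitions, so the function names, the Hann window, the symmetrized form, and the `eps` floor are all assumptions made for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the two edge samples have no neighbour.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def power_spectrum(frame):
    """Power spectrum of one frame (Hann window is an illustrative choice)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed)) ** 2


def symmetric_kl(p_spec, q_spec, eps=1e-12):
    """Symmetrized KL distance, KL(p||q) + KL(q||p), between two power spectra.

    Each spectrum is floored by eps (an assumption, to avoid log(0)) and
    normalized to sum to 1 so it can be treated as a distribution.
    """
    p = np.asarray(p_spec, dtype=float) + eps
    q = np.asarray(q_spec, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))


# Hypothetical usage: compare the frames on either side of a join.
n = np.arange(256)
left_frame = np.cos(0.30 * n)   # stand-in for the end of the left unit
right_frame = np.cos(0.45 * n)  # stand-in for the start of the right unit
distance = symmetric_kl(power_spectrum(left_frame), power_spectrum(right_frame))
```

For a pure sinusoid A cos(Ωn), the operator returns the constant A² sin²(Ω), which is why it can be used to separate amplitude and frequency information. The distance is zero for identical spectra and grows as the spectra on the two sides of a join diverge.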
Related resources
On the Detection of Discontinuities in Concatenative Speech Synthesis
Over the last decade, considerable work has been done on finding an objective distance measure that can predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts, and discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets, most of them of spectral na...
Steps and Methods for Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System
Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their advantages relative to each other. ...
Perceptual and objective detection of discontinuities in concatenative speech synthesis
Concatenative speech synthesis systems attempt to minimize audible signal discontinuities between two successive concatenated units. An objective distance measure which is able to predict audible discontinuities is therefore very important, particularly in unit selection synthesis, for which units are selected from among a large inventory at run time. In this paper, we describe a perceptual tes...
Irwin and Joan Jacobs Center for Communication and Information Technologies: A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units
Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores natural speech features segments, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in t...
Objective distance measures for spectral discontinuities in concatenative speech synthesis
The quality of unit selection based concatenative speech synthesis mainly depends on how well two successive units can be joined together to minimise the audible discontinuities. The objective measure of discontinuity used when selecting units is known as the join cost. The ideal join cost will measure perceived discontinuity, based on easily measurable spectral properties of the units being jo...
Publication date: 2004